🐿️ Scour
🧠 LLM Inference

Quantization, Attention Mechanisms, Batch Processing, KV Caching

How Fast is Algorithmic Progress in AI Inference?
lesswrong.com·10h
📊Model Serving Economics
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning
arxiv.org·1h
🏆LLM Benchmarking
Spending Inference Time
kevinlu.ai·15h·
Discuss: Hacker News
📊Model Serving Economics
Phi-4-mini-flash-reasoning Model: Redefining AI Efficiency
pub.towardsai.net·18h
📱Edge AI Optimization
From the Tensor to the Transformer: Building the AI stack from first principles
github.com·16h·
Discuss: Hacker News
⚡Hardware Acceleration
KG-Attention: Knowledge Graph-Guided Attention at Test-Time via Bidirectional Information Aggregation
arxiv.org·1h
🔍Information Retrieval
How much attention do you need, really? Experiments in O(1) task completion
notion.so·12h·
Discuss: Hacker News
🔢BitNet Inference
Fine-tuning / RL post training for tool calling
arxiv.org·2h·
Discuss: r/LocalLLaMA
🪄Prompt Engineering
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
machinelearning.apple.com·5h
🗜️Vector Compression
A Comprehensively Adaptive Architectural Optimization-Ingrained Quantum Neural Network Model for Cloud Workloads Prediction
arxiv.org·1h
📱Edge AI Optimization
The AI expertise conundrum
vaughntan.org·12h
🏆LLM Benchmarking
Graph Convolutional Branch and Bound
arxiv.org·1h
📉Embeddings Optimization
What Factors Affect LLMs and RLLMs in Financial Question Answering?
arxiv.org·1h
🏆LLM Benchmarking
Training an LLM only on books from the 1800's - no modern bias
github.com·3h·
Discuss: r/LocalLLaMA
🏆LLM Benchmarking
Conditional probability is the single most important concept in statistics.
threadreaderapp.com·12h
📊Statistical Ranking
Mechanistic Indicators of Understanding in Large Language Models
arxiv.org·1h
🔍AI Interpretability
Hill Space: Neural nets that do perfect arithmetic (to 10⁻¹⁶ precision)
hillspace.justindujardin.com·22h·
Discuss: Hacker News
🔢BitNet
Differentiable Programming for Learnable Graphs: Optimizing LLM Workflows W DSPy
viksit.substack.com·6h·
Discuss: Substack
🔄Incremental Computation
BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis
arxiv.org·1h
🗜️Zstd
Computing embeddings offline for Gemma 3 1B (on-device model)
ai.google.dev·7h·
Discuss: r/LocalLLaMA
📱Edge AI Optimization